jq for datasets

Fast, flexible data manipulation across multiple file formats using familiar jq-like syntax


Key Features

High Performance

Built on Polars DataFrames, using lazy evaluation and columnar operations for fast data processing

Format Flexibility

Supports Parquet, Avro, CSV, JSON Lines, Arrow, and more with automatic format detection

User-Friendly

Intuitive jq-inspired syntax with interactive REPL mode and clear error messages

Supported Formats

CSV/TSV, Parquet, JSON/JSON Lines, Arrow, Avro, ASCII Delimited Text

Perfect For

Data Analysts

Quick data exploration and transformation

Developers

Pipeline integration and data processing

Data Engineers

ETL workflows and format conversion

Researchers

Dataset analysis and manipulation

Get Started Today

Install via cargo:


$ cargo install dsq-cli

Or download pre-compiled binaries from GitHub

Transform data

# Input: employees.csv
# id,name,age,city,salary,department
# 1,Alice Johnson,28,New York,75000,Engineering
# 2,Bob Smith,34,Los Angeles,82000,Sales
# ...

$ dsq 'map(.salary += 5000) | map({name, new_salary: .salary, department})' employees.csv

[
  {"department": "Engineering", "name": "Alice Johnson", "new_salary": 80000},
  {"department": "Sales", "name": "Bob Smith", "new_salary": 87000},
  {"department": "Marketing", "name": "Carol Williams", "new_salary": 73000},
  ...
]
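
For readers less familiar with jq syntax, the filter above can be read as two per-row steps: raise each salary by 5000, then keep three fields, renaming salary to new_salary. The following is a minimal Python sketch of that logic, using only the two rows shown in the input comment. The function name give_raise is hypothetical, and this is an illustration of the filter's meaning, not of how dsq is implemented.

```python
import csv
import io

# Only the two sample rows shown in the input comment; the real file has more.
EMPLOYEES_CSV = """id,name,age,city,salary,department
1,Alice Johnson,28,New York,75000,Engineering
2,Bob Smith,34,Los Angeles,82000,Sales
"""


def give_raise(rows, amount=5000):
    """Hypothetical helper mirroring:
    map(.salary += 5000) | map({name, new_salary: .salary, department})
    """
    result = []
    for row in rows:
        # Step 1: .salary += 5000 (CSV values arrive as strings, so convert first)
        salary = int(row["salary"]) + amount
        # Step 2: keep name and department, expose the salary as new_salary
        result.append({
            "name": row["name"],
            "new_salary": salary,
            "department": row["department"],
        })
    return result


rows = list(csv.DictReader(io.StringIO(EMPLOYEES_CSV)))
raised = give_raise(rows)
for record in raised:
    print(record)
```

The sketch converts the salary column by hand because Python's csv module reads every value as a string; the dsq example above works on numeric salaries directly.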

Group and aggregate

# Input: employees.csv

$ dsq 'group_by(.department) | map({dept: .[0].department, count: length, avg_salary: (map(.salary) | add / length)})' employees.csv

[
  {"avg_salary": 90666.67, "count": 3, "dept": "Engineering"},
  {"avg_salary": 63500.0, "count": 2, "dept": "HR"},
  {"avg_salary": 83000.0, "count": 3, "dept": "Sales"},
  {"avg_salary": 69500.0, "count": 2, "dept": "Marketing"}
]
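
The aggregation above does three things: it buckets rows by department, counts the rows in each bucket, and divides the summed salaries by that count. The following is a minimal Python sketch of the same computation. The rows and the function name summarize_by_department are hypothetical and are not taken from employees.csv, so the numbers will not match the output above.

```python
from collections import defaultdict

# Hypothetical rows for illustration only; not the contents of employees.csv.
rows = [
    {"department": "Engineering", "salary": 100},
    {"department": "Sales", "salary": 80},
    {"department": "Engineering", "salary": 120},
]


def summarize_by_department(rows):
    """Hypothetical helper mirroring:
    group_by(.department)
    | map({dept: .[0].department, count: length,
           avg_salary: (map(.salary) | add / length)})
    """
    # group_by(.department): collect the salaries of each department
    groups = defaultdict(list)
    for row in rows:
        groups[row["department"]].append(row["salary"])

    # One summary record per group: count and arithmetic mean
    return [
        {
            "dept": dept,
            "count": len(salaries),
            "avg_salary": sum(salaries) / len(salaries),
        }
        for dept, salaries in groups.items()
    ]


summary = summarize_by_department(rows)
for record in summary:
    print(record)
```

Groups appear here in first-seen order, because Python dictionaries preserve insertion order. The order in which dsq emits groups may differ.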

Group with statistics

# Input: books.csv
# title,author,year,genre,price
# "The Great Gatsby",F. Scott Fitzgerald,1925,Fiction,10.99
# "1984",George Orwell,1949,Dystopian,9.99
# ...

$ dsq 'group_by(.genre) | map({genre: .[0].genre, count: length, avg_price: (map(.price) | add / length)})' books.csv

[
  {"avg_price": 11.58, "count": 3, "genre": "Fiction"},
  {"avg_price": 9.75, "count": 2, "genre": "Dystopian"},
  {"avg_price": 14.99, "count": 1, "genre": "Fantasy"},
  ...
]

Convert formats

# CSV to Parquet
$ dsq '.' data.csv -o output.parquet

# JSON Lines to CSV
$ dsq '.' data.jsonl -o output.csv

# Parquet to JSON
$ dsq '.' data.parquet -o output.json

Cross-platform support for Linux, macOS, and Windows